281 research outputs found

ncRNA orthologies in the vertebrate lineage.

Author: Flicek P
Gordon L
Herrero J
Muffato M
Pignatelli M
Vilella AJ
White S
Publication date: 15/03/2016

Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes, and the evolutionary models used by these methods normally assume that the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam and for which a specific covariance model is provided. For each family, ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API and on our FTP site, and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org

UCL Discovery

PubMed Central

Evolutionary Sequence Analysis and Visualization with Wasabi

Author: A Löytynoja
A Veidenberg
A Yates
AJ Vilella
B Paten
BR Baum
DR Maddison
DR Zerbino
J Huerta-Cepas
J Zhang
K Katoh
MA Larkin
MN Price
MV Han
S Kumar
YS Cho
Z Yang
Publication venue: Humana press
Publication date: 01/01/2021

Wasabi is an open-source, web-based graphical environment for evolutionary sequence analysis and visualization, designed to work with multiple sequence alignments within their phylogenetic context. Its interactive user interface provides convenient access to external data sources and computational tools and is easily extendable with custom tools and pipelines using a plugin system. Wasabi stores intermediate editing and analysis steps as workflow histories and provides direct-access web links to datasets, allowing for reproducible, collaborative research and easy dissemination of the results. In addition to shared analyses and installation-free usage, the web-based design allows Wasabi to be run as a cross-platform, stand-alone application and makes its integration with other web services straightforward. This chapter gives a detailed description of, and guidelines for the use of, Wasabi's analysis environment. Example use cases give step-by-step instructions for practical application of the public Wasabi, from quick data visualization to branched analysis pipelines and publishing of results. We end with a brief discussion of advanced usage of Wasabi, including command-line communication, interface extension, offline usage, and integration with local and public web services. Peer reviewed.

Crossref

Helsingin yliopiston digitaalinen arkisto

Extensive Copy-Number Variation of Young Genes across Stickleback Populations

Author: A Abyzov
A Alexa
A Conesa
A Hussain
AJ Iafrate
AJ Sharp
AJ Vilella
AR Boyko
AR Quinlan
B Guo
BE Deagle
C Eizaguirre
Christophe Eizaguirre
CL McGrath
CL Peichel
D Bryant
D Juan
D Tautz
DE Cook
DH Huson
DJ Turner
DR Schrider
DR Zerbino
E Gazave
E Proux
Erich Bornberg-Bauer
FA Kondrashov
FC Jones
Frédéric J. J. Chain
G Gibson
G Orti
GC Conant
GH Perry
GM Cooper
H Kehrer-Sawatzki
H Li
Irene E. Samonte
J Sebat
JA Fawcett
Jianzhi Zhang
JJ Emerson
JK Colbourne
JO Korbel
K Chen
K Khalturin
K Ye
KJ Lipinski
KJ Livak
KM Teshima
KM Wegner
L Xu
LC Hsing
LR Saraiva
M Hiraiwa
M Long
M Lynch
M Milinski
M Roesti
MA DePristo
Mahesh Panchal
Manfred Milinski
Martin Kalbe
Monika Stoll
N Ghanem
P Danecek
P Flicek
P Sjödin
PA Hohenlohe
PGD Feulner
PH Sudmant
Philine G. D. Feulner
PM Kim
R Redon
RC Iskow
S Moretti
S Sawyer
SF Altschul
SH Williamson
SM Waszak
SR Browning
T Marques-Bonet
T Rausch
TD Schmittgen
Thorsten B. H. Reusch
Tobias L. Lenz
V Guryev
V Katju
V Ranwez
X Huang
Y Hashiguchi
Y Zheng
YE Zhang
YF Chan
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal no. 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

OceanRep

Crossref

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

Bern Open Repository and Information System (BORIS)

MPG.PuRe

FigShare

Ultra-fast sequence clustering from similarity networks with SiLiX

Author: A Krishnamurthy
AJ Enright
AJ Vilella
AY Signorovitch
F Servant
H Li
HJ Atkinson
I Katriel
J Ruan
JL Boore
JM Joseph
KD Pruitt
Laurent Duret
MH Alsuwaiyel
PK Wall
PS Dehal
R Petryszak
R Tarjan
RD Finn
RE Tarjan
S Hartmann
S Hunter
S Penel
S Vishwanathan
SF Altschul
Simon Penel
SK Das
T Meinel
T Wittkop
Vincent Miele
Y Bramoulle
Y Han
Y Loewenstein
Y Tian
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The number of gene sequences available for comparative genomics approaches is increasing extremely quickly. A current challenge is to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX, which implements a novel method that reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
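
Single linkage clustering can be read as finding the connected components of a similarity graph: two sequences land in the same family whenever a chain of accepted similarity hits links them. The sketch below illustrates that general idea with a union-find structure. It is not SiLiX's actual implementation; the function name `single_linkage_families` and the input format (a list of sequence-ID pairs that already passed some similarity filter) are assumptions made for illustration.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequence IDs into families, i.e. the connected components
    of the graph whose edges are the given similarity pairs.

    `pairs` is a hypothetical input: an iterable of (id_a, id_b) tuples,
    one per similarity hit that has already been accepted.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    families = defaultdict(set)
    for seq in list(parent):
        families[find(seq)].add(seq)
    # Sorted output so that results are deterministic.
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
    print(single_linkage_families(hits))
```

Because each hit is processed once and never needs to be kept in memory afterwards, this kind of approach can stream through very large hit lists, which is one plausible reason a graph-based formulation scales to billions of hits.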

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Ortho2ExpressMatrix—a web server that interprets cross-species gene expression data by gene family information

Author: A Krause
A Valencia
AC Berglund
AJ Enright
AJ Vilella
Andreas H Ludewig
BY Liao
C Frech
EL Sonnhammer
EV Koonin
G Ostlund
H Edwards
H Parkinson
HS Le
I Rivals
J Michaud
KI Goh
L Huminiecki
M Kanehisa
M Kapushesky
M Pellegrini
M Remm
Michal R Schweiger
P Flicek
Ralf Herwig
Ramu Chenna
RC Friedman
RD Finn
RL Tatusov
S Abhiman
S Griffiths-Jones
S Haider
SF Altschul
Sylvia Krobitsch
T Barrett
T Domazet-Loso
T Meinel
Thomas Meinel
TJ Hubbard
TW Harris
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The study of gene families is pivotal for understanding gene evolution across different organisms, and such phylogenetic background is often used to infer biochemical functions of genes. Modern high-throughput experiments offer the possibility to analyze the entire transcriptome of an organism; however, it is often difficult to deduce functional information from that data. Results: To improve functional interpretation of gene expression we introduce Ortho2ExpressMatrix, a novel tool that integrates complex gene family information, computed from sequence similarity, with comparative gene expression profiles of two pre-selected biological objects: gene families are displayed with two-dimensional matrices. Parameters of the tool are object type (two organisms, two individuals, two tissues, etc.), type of computational gene family inference, experimental meta-data, microarray platform, gene annotation level and genome build. Family information in Ortho2ExpressMatrix is based on computationally different protein family approaches such as EnsemblCompara, InParanoid, SYSTERS and Ensembl Family. Currently, respective all-against-all associations are available for five species: human, mouse, worm, fruit fly and yeast. Additionally, microRNA expression can be examined with respect to miRBase or TargetScan families. The visualization, which is typical for Ortho2ExpressMatrix, is performed as a matrix view that displays functional traits of genes (differential expression) as well as sequence similarity of protein family members (BLAST e-values) in colour codes. Such translations are intended to facilitate the user's perception of the research object. Conclusions: Ortho2ExpressMatrix integrates gene family information with genome-wide expression data in order to enhance functional interpretation of high-throughput analyses on diseases, environmental factors, or genetic modification or compound treatment experiments. The tool explores differential gene expression in the light of orthology, paralogy and structure of gene families up to the point of ambiguity analyses. Results can be used for filtering and prioritization in functional genomic, biomedical and systems biology applications. The web server is freely accessible at http://bioinf-data.charite.de/o2em/cgi-bin/o2em.pl.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Gene Ontology: Pitfalls, Biases, and Remedies.

Author: A Schlicker
AF Baas
AJ Vilella
AK Rider
AM Altenhoff
AM Schnoes
C Dessimoz
C Hass
D Binns
EL Clarke
H Mi
JF Granada
JL Sevilla
M Mistry
MG Mason
N Škunca
NL Nehrt
P Gaudet
PD Thomas
PJ Bickel
RP Huntley
SY Rhee
The Gene Ontology Consortium
Y Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/02/2016

The Gene Ontology (GO) is a formidable resource, but several considerations are essential to understand the data and interpret it correctly. The GO is sufficiently simple that it can be used without deep understanding of its structure or how it is developed, which is both a strength and a weakness. In this chapter, we discuss some common misinterpretations of the ontology and the annotations. A better understanding of the pitfalls and the biases in the GO should help users make the most of this very rich resource. We also review some of the misconceptions and misleading assumptions commonly made about GO, including the effect of data incompleteness, the importance of annotation qualifiers, and the transitivity or lack thereof associated with different ontology relations. We also discuss several biases that can confound aggregate analyses such as gene enrichment analyses. For each of these pitfalls and biases, we suggest remedies and best practices.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Serveur académique lausannois

UCL Discovery

Database: The Journal of Biological Databases and Curation

Author: Amode R
Beal K
Brent S
Fitzgerald S
Flicek P
Gordon L
Herrero J
Kulesha E
Muffato M
Pignatelli M
Searle SM
Spooner W
Vilella AJ
Yates A
Publication date: 01/01/2016

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org

UCL Discovery

Quality of Computationally Inferred Gene Ontology Annotations

Author: A Bairoch
A del Pozo
Adrian Altenhoff
AJ Vilella
B Jin
C Blaschke
CE Jones
Christophe Dessimoz
D Barrell
DP Hill
E Camon
EB Camon
ES Julfayev
F Supek
G Alterovitz
H Wickham
I Yeh
L du Plessis
Lars Juhl Jensen
ME Dolan
Nives Škunca
P Gaudet
R Rentzsch
S Benabderrahmane
S Hunter
S Leonelli
S Maekawa
S Meng
T Lima
TJ Buza
W-C Wong
WA Baumgartner
Publication venue: Public Library of Science
Publication date: 01/05/2012

Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we found not only that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

FigShare

MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

Author: A Alexeyenko
A Kuzniar
AJ Enright
AJ Vilella
C Chauve
C Notredame
D Sankoff
F Mao
F Wu
Guanqun Shi
H Kishino
H Li
H Wain
J Felsenstein
J Zhang
JP Huelsenbeck
K Katoh
L Goodstadt
L Li
Liqing Zhang
M Blanchette
M Hurles
M Remm
M Semon
M Suyama
MD Rasmussen
O Gascuel
P Pevzner
PN Hess
R Chenna
R Friedman
R Sharan
RL Tatusov
S Bandyopadhyay
S Guindon
S Hannenhalli
S Maere
S Ohno
SF Altschul
Tao Jiang
V Shoja
WJ Kent
WM Fitch
X Chen
Z Fu
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model). Results: In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, the Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is actually better than that of InParanoid in the simulation tests. Conclusions: Our preliminary experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for one-to-one ortholog assignment between closely related genomes. The software is available to the public for free and included as online supplementary material.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Using WormBase: A Genome Biology Resource for Caenorhabditis elegans and Related Nematodes

Author: A Kalderimis
A Mitchell
AG Alexander
AJ Bretscher
AJ Vilella
C Camacho
C Trapnell
D Angeles-Albores
DB Rhee
E Culetto
G Schindelman
Gene Ontology Consortium
H Li
H Motenko
I Greenwald
I Lee
J Giacomotto
J Li
J Zheng
J-F Rual
JS Amberger
K-W Park
KL Howe
LD Stein
LM Schriml
LP O’Reilly
M Artal-Sanz
MB Gerstein
ME Skinner
OE Blacque
P Gaudet
R Balakrishnan
R Lyne
R O’Hagan
RC Edgar
RD Finn
RN Smith
RP Huntley
RS Kamath
RYN Lee
S Burge
S Contrino
S Powell
S-J Lee
SF Altschul
The Gene Ontology Consortium
TW Harris
W Zhong
WA Kibbe
WJ Kent
Y Nakamura
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/05/2018

WormBase (www.wormbase.org) provides the nematode research community with a centralized database for information pertaining to nematode genes and genomes. As more nematode genome sequences become available and as richer data sets are published, WormBase strives to maintain updated information, displays, and services to facilitate efficient access to and understanding of the knowledge generated by the published nematode genetics literature. This chapter aims to explain how to use basic features of WormBase, new features, and some commonly used tools and data queries. Explanations of the curated data and step-by-step instructions on how to access the data via the WormBase website and available data mining tools are provided.

Crossref

Caltech Authors